In this letter we revisit the problem of optimal design of quantum tomographic experiments. In contrast to previous approaches, where an optimal set of measurements is decided in advance of the experiment, we allow for measurements to be adaptively and efficiently re-optimised depending on the data collected so far. We develop an adaptive statistical framework based on Bayesian inference and Shannon's information, and demonstrate a ten-fold reduction in the total number of measurements required as compared to non-adaptive methods, including mutually unbiased bases.
Quantum tomography is a valuable tool in quantum information processing, being essential for the characterisation of quantum states, gates, and measurement equipment. Quantum state tomography (QST) aims to determine an unknown quantum state from the outcomes of measurements performed on an ensemble of identically prepared systems. Measurements in quantum systems are non-deterministic, hence QST is a classical statistical estimation problem. Full tomography is inherently resource-intensive: even in moderately sized systems, the number of measurements required is often prohibitive. There is a need for methods that allow for shorter experiments. Optimal experiment design (OED) aims to achieve this by cleverly selecting which measurements to use during the experiment.

Most existing approaches to OED determine, prior to collecting data, an optimal set of measurements to be used throughout the experiment. In this sense, whenever they exist, mutually unbiased bases (MUBs) are known to be optimal [1, 2]. Research since has focused mainly on proving or disproving the existence of MUBs in various dimensions, and on implementing them [2-4]. Other work [5, 6] has considered OED based on the Cramér-Rao bound. Here we argue that these approaches, including MUBs, provide only a partial solution to the problem of optimal experiment design, inasmuch as they do not take partial data into account. If we are allowed to revise our choice of measurements during the experiment based on the data collected so far, we may be in a better position to reduce redundancy. This strategy is generally known as active learning or adaptive sampling. In physics, this approach has been referred to as self-learning measurements [7, 8]. However, due to the expensive computations involved, these methods have been restricted to two-dimensional pure quantum states, or to very few measurements. Recent advances in Bayesian methods allow us to build a fast, online algorithm that enables self-learning in arbitrary dimensions with many measurements.

Here we propose a new algorithmic framework, which we call Adaptive Bayesian Quantum Tomography (ABQT), that builds on full Bayesian inference and Shannon information. To achieve adaptivity in practice, we need a fast algorithm for performing Bayesian state reconstruction from partial data after each measurement. Current sampling methods, such as that in [9], are inappropriate, as their cost increases with the number of measurement configurations tried so far. As a solution, we present a sequential importance sampling scheme [10] that does not suffer from this problem. We then use the developed algorithm in conjunction with an information-theoretic objective to adaptively optimise measurements. We assess the relative performance of our adaptive method in Monte Carlo simulations of qubit systems, and demonstrate a ten-fold reduction in the number of measurements needed for full tomography of two-qubit pure states. We also investigate the trade-off between entangling and separable measurements in multipartite systems. Our central finding is that via adaptive tomography one can achieve, and even surpass, the statistical efficiency of MUB tomography using only separable measurements, which require experimental apparatus that is substantially easier to build with current technology.

Quantum state tomography involves determining the quantum state, $\rho$, of a system from experimental data, by performing measurements on several identical copies. For a $D$-dimensional system ($D = 2^m$ for $m$-qubit systems), $\rho$ is a $D \times D$ complex-valued density matrix. $\rho$ has to be Hermitian, positive semidefinite and of unit trace, so $D^2 - 1$ real degrees of freedom must be estimated. The apparatus for a tomographic experiment may be configured in several different ways; we use $\alpha \in \mathcal{A}$ to index all accessible configurations. Each measurement configuration $\alpha$ is characterised by a positive operator-valued measure (POVM). For each configuration, a measurement results in observing one of a finite number, $\Gamma$, of distinguishable outcomes. A POVM is defined by a set, $M_\alpha$, of positive semidefinite (hence Hermitian) operators $M_{\alpha\gamma}$, indexed by the possible outcomes $\gamma \in \{1, \dots, \Gamma\}$ and satisfying $\sum_{\gamma=1}^{\Gamma} M_{\alpha\gamma} = I$. These POVMs jointly constitute our tomographic model $\mathcal{M} = \{M_\alpha : \alpha \in \mathcal{A}\}$ and determine the probability of observing outcome $\gamma$ in configuration $\alpha$ when the measured system is in state $\rho$, via Born's rule:

$$ P(\gamma|\rho,\alpha;\mathcal{M}) = \operatorname{tr}\{M_{\alpha\gamma}\,\rho\} $$
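As an illustration of the tomographic model and Born's rule, the following minimal Python/NumPy sketch builds a binary projective POVM for a single qubit ($D = 2$, $\Gamma = 2$) along a Bloch direction, checks the POVM conditions, and evaluates $P(\gamma|\rho,\alpha) = \operatorname{tr}\{M_{\alpha\gamma}\rho\}$. The helper names (`projective_povm`, `is_povm`, `born_probabilities`) and the identification of $|H\rangle$ with the $+z$ Bloch direction are our own assumptions for illustration.

```python
import numpy as np

# Single-qubit Pauli matrices and identity (D = 2)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def projective_povm(n):
    """Binary POVM {(I + n.sigma)/2, (I - n.sigma)/2} along the Bloch direction n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    sigma_n = n[0] * X + n[1] * Y + n[2] * Z
    return [(I2 + sigma_n) / 2, (I2 - sigma_n) / 2]


def is_povm(effects, tol=1e-10):
    """Check that every effect is positive semidefinite and that the effects sum to I."""
    dim = effects[0].shape[0]
    positive = all(np.linalg.eigvalsh(M).min() > -tol for M in effects)
    complete = np.allclose(sum(effects), np.eye(dim), atol=tol)
    return positive and complete


def born_probabilities(rho, effects):
    """Born's rule: P(gamma | rho, alpha) = tr(M_{alpha gamma} rho) for every outcome gamma."""
    return np.array([np.trace(M @ rho).real for M in effects])


# Usage: the pure state |H><H| (assumed to sit at +z on the Bloch sphere)
rho_H = np.array([[1, 0], [0, 0]], dtype=complex)
p_z = born_probabilities(rho_H, projective_povm([0, 0, 1]))  # deterministic outcome
p_x = born_probabilities(rho_H, projective_povm([1, 0, 0]))  # both outcomes equally likely
```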

State reconstruction has been approached with several methods, the most popular being maximum likelihood estimation (MLE). MLE finds the physically feasible state $\rho$ that is most likely to have produced the observed data, $\mathcal{D}$, by maximising the likelihood:

$$ \mathcal{L}(\rho;\mathcal{D}) = \prod_{\alpha\in\mathcal{A}} \prod_{\gamma=1}^{\Gamma} P(\gamma|\rho,\alpha)^{c_{\alpha\gamma}} \quad (1) $$

where $c_{\alpha\gamma}$ is the number of times outcome $\gamma$ was observed in configuration $\alpha$. All probabilities are conditional on $\mathcal{M}$; for brevity this is omitted. A well-known drawback of MLE is that it often yields rank-deficient estimates, and thus assigns zero predictive probability to certain observations [9]. This seems an unreasonable conclusion on the basis of a finite sample. Additionally, MLE provides no measure of uncertainty in its point estimate.
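In practice Eqn. (1) is handled in log form. The sketch below (Python/NumPy; the dictionary layout for POVMs and counts is a hypothetical choice for illustration) evaluates $\log\mathcal{L}(\rho;\mathcal{D}) = \sum_{\alpha,\gamma} c_{\alpha\gamma}\log\operatorname{tr}\{M_{\alpha\gamma}\rho\}$ and reproduces the drawback noted above: a rank-deficient estimate assigns probability zero, and hence log-likelihood $-\infty$, to an outcome it deems impossible.

```python
import numpy as np


def log_likelihood(rho, povms, counts):
    """log L(rho; D) = sum over alpha, gamma of c_{alpha gamma} * log tr(M_{alpha gamma} rho).

    povms:  dict mapping configuration alpha -> list of effects M_{alpha gamma}
    counts: dict mapping configuration alpha -> list of counts c_{alpha gamma}
    """
    total = 0.0
    for alpha, effects in povms.items():
        for M, c in zip(effects, counts[alpha]):
            if c == 0:
                continue  # unobserved outcomes contribute nothing, even if P = 0
            p = np.trace(M @ rho).real
            if p <= 0.0:
                return -np.inf  # the state rules out an outcome that was actually observed
            total += c * np.log(p)
    return total


# Usage: a single configuration "z" measuring in the {|H>, |V>} basis
povms = {"z": [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]}
rho_pure = np.diag([1.0, 0.0])   # rank-deficient (pure) estimate |H><H|
rho_mixed = np.diag([0.9, 0.1])  # full-rank estimate

ll_pure_all_H = log_likelihood(rho_pure, povms, {"z": [10, 0]})
ll_pure_one_V = log_likelihood(rho_pure, povms, {"z": [10, 1]})
ll_mixed_one_V = log_likelihood(rho_mixed, povms, {"z": [10, 1]})
```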

More sophisticated methods for quantum tomography use Bayesian inference and suffer from neither of these problems [9, and refs. therein]. In Bayesian inference a prior probability density, $p(\rho)$, over feasible states is specified. This prior is then combined with the likelihood from Eqn. (1) using Bayes' rule to yield a posterior distribution:

$$ p(\rho|\mathcal{D}) \propto \mathcal{L}(\rho;\mathcal{D})\, p(\rho) \quad (2) $$
Should we want a point estimate, we may report, say, the Bayesian mean estimate (BME), which is known to minimise expected operational divergences [9, 11]. But importantly, Bayesian inference also provides error bars, and more: the posterior fully captures our remaining uncertainty about the true state, having seen the data $\mathcal{D}$.

For Bayesian inference one has to provide the prior $p(\rho)$, which is typically chosen to be non-informative or uniform. Here we adopt the representation and prior introduced in [9], which treats our original system of interest as part of a larger, $D \times K$ dimensional bipartite system. Our prior over the mixed state $\rho$ is then defined as the measure induced by the uniform (Haar) measure over pure states in $D \times K$ dimensions. It is easy to see that tracing out the $K$-dimensional ancillary part leaves us with a mixed state $\rho$ of rank at most $K$. Thus, by tuning $K$ we can trade off computational efficiency against estimation accuracy, in a manner similar to compressed sensing [12].

Unfortunately, normalisation of the posterior distribution (Eqn. (2)) is analytically intractable, and therefore we have to approximate the posterior, usually via Markov chain Monte Carlo (MCMC) methods. Several MCMC approaches have been suggested in this context [9, and refs. therein]. These methods require evaluation of the full likelihood (1), whose cost is $O(n)$ in the number, $n$, of different configurations used so far. This is undesirable for adaptive tomography, where inference has to be performed after each measurement. To address this problem, we developed a fast sequential importance sampling (SIS) algorithm whose likelihood evaluation cost per measurement is $O(1)$ in $n$. As we are not aware of this approach being used in the context of QST, we briefly explain the basic version below. The interested reader is referred to [10] for a thorough overview.

In SIS, one keeps track of a number of samples, often called particles, $\rho_s$ ($s = 1, \dots, S$), and corresponding weights $w_s$ ($\sum_s w_s = 1$), which are updated sequentially every time a new measurement is made. Assume that after $n$ measurements, having observed data $\mathcal{D}_n$, the particles and weights $\{\rho_s, w_s^{(n)}\}$ constitute an approximation to the posterior:

$$ p(\rho|\mathcal{D}_n) \approx \sum_{s=1}^{S} w_s^{(n)}\, \delta(\rho - \rho_s) \quad (3) $$

Using this approximation and Bayes' rule, one can derive an approximation to the next posterior, after observing a new outcome $\gamma_{n+1}$ in configuration $\alpha_{n+1}$, as:

$$ p(\rho|\mathcal{D}_{n+1}) \approx \sum_{s=1}^{S} w_s^{(n+1)}\, \delta(\rho - \rho_s), \qquad w_s^{(n+1)} = \frac{w_s^{(n)}\, P(\gamma_{n+1}|\rho_s,\alpha_{n+1})}{\sum_{s'=1}^{S} w_{s'}^{(n)}\, P(\gamma_{n+1}|\rho_{s'},\alpha_{n+1})} \quad (4) $$

The new weights $w_s^{(n+1)}$ are the renormalised product of our current weights $w_s^{(n)}$ and the observation probabilities $P(\gamma_{n+1}|\rho_s,\alpha_{n+1})$. This update is fast and only requires computing one term of the full likelihood, so its complexity is independent of how many configurations have been tried before. This computational efficiency comes at a price: as the experiment progresses, many weights decay to almost zero, and the quality of our approximation drops. This issue can be detected by monitoring the effective sample size and handled by resampling appropriately [10].
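A minimal sketch of the prior of [9] and of the basic SIS update of Eqns. (3) and (4), in Python/NumPy. Assumptions not fixed by the text: particles are drawn once from the prior by sampling Haar-random pure states on $\mathbb{C}^D\otimes\mathbb{C}^K$ (normalised complex Gaussian vectors) and tracing out the ancilla; resampling is plain multinomial resampling, triggered when the effective sample size $1/\sum_s w_s^2$ falls below $S/2$; and the demo cycles non-adaptively through Pauli measurements on a hypothetical true state $|H\rangle$. Practical SIS schemes usually follow resampling with a move (rejuvenation) step, e.g. a few MCMC iterations, to restore particle diversity [10]; that step is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_prior(num_particles, D, K, rng):
    """Draw rho = tr_K |psi><psi| with |psi> Haar-random on C^D (x) C^K, so rank(rho) <= K."""
    shape = (num_particles, D, K)
    psi = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    psi /= np.linalg.norm(psi.reshape(num_particles, -1), axis=1)[:, None, None]
    return psi @ psi.conj().transpose(0, 2, 1)  # partial trace over the K-dim ancilla


def sis_update(weights, particles, effect):
    """Eqn (4): w_s <- w_s * P(gamma | rho_s, alpha), renormalised. Cost does not grow with n."""
    lik = np.einsum("ij,sji->s", effect, particles).real  # tr(M rho_s) for every particle
    new = weights * np.clip(lik, 0.0, None)
    return new / new.sum()


def effective_sample_size(weights):
    return 1.0 / np.sum(weights ** 2)


def resample(weights, particles, rng):
    """Multinomial resampling: duplicate heavy particles, drop light ones, reset to uniform weights."""
    S = len(weights)
    idx = rng.choice(S, size=S, p=weights)
    return np.full(S, 1.0 / S), particles[idx]


def bayes_mean(weights, particles):
    """Bayesian mean estimate: sum_s w_s rho_s."""
    return np.einsum("s,sij->ij", weights, particles)


# Demo (hypothetical): true state |H><H|, non-adaptive cycle through sigma_z, sigma_x, sigma_y
I2 = np.eye(2)
paulis = [np.array([[1, 0], [0, -1]]), np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]])]
rho_true = np.array([[1, 0], [0, 0]], dtype=complex)

S = 2000
particles = sample_prior(S, D=2, K=2, rng=rng)
weights = np.full(S, 1.0 / S)
for n in range(300):
    sigma = paulis[n % 3]
    effects = [(I2 + sigma) / 2, (I2 - sigma) / 2]
    p_first = np.trace(effects[0] @ rho_true).real
    gamma = 0 if rng.random() < p_first else 1  # simulated measurement outcome
    weights = sis_update(weights, particles, effects[gamma])
    if effective_sample_size(weights) < S / 2:
        weights, particles = resample(weights, particles, rng)

rho_bme = bayes_mean(weights, particles)
```

Each update touches only the $S$ particles and the single effect that was observed, which is the property that makes per-measurement inference affordable.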

Having discussed our method for estimating the state based on partial data, we now turn to the problem of optimal experiment design. Different state determination schemes have different OED strategies associated with them. Maximum likelihood methods usually use some form of the Cramér-Rao bound [5, 6]. Bayesian experiment design, on the other hand, is based on Shannon information [1, 13]. The posterior characterises our remaining uncertainty in the parameter, and this uncertainty can be quantified using Shannon's entropy. A sensible aim is to pick an experimental configuration $\alpha$ such that, after observing the outcome $\gamma$, the entropy $H$ of the new posterior is reduced the most:

$$ \arg\max_{\alpha\in\mathcal{A}} \Big\{ H\big[p(\rho|\mathcal{D})\big] - \mathbb{E}_{P(\gamma|\alpha,\mathcal{D})}\Big[ H\big[p(\rho|\gamma,\alpha,\mathcal{D})\big] \Big] \Big\} \quad (5) $$

The expectation with respect to $\gamma$ is needed because the measurement outcome is not known in advance. This objective naturally allows us to address the question 'Having seen the outcome of the first few measurements, which measurement should we carry out next?' In previous work, however, it has not been used in this way. Rather, it was used to determine a single best set of measurements, which are then sampled uniformly throughout the experiment [1, 13]. Under these circumstances, mutually unbiased bases (MUBs) are optimal whenever they exist. We exploit the dependence of Eqn. (5) on past observations, and allow for measurements to be re-optimised adaptively as the experiment progresses.



However, Eqn. (5) is impractical to work with directly, as it involves computing entropies of high-dimensional, intractable posterior densities. Recall that we approximate our posterior by samples, from which it is notoriously hard to estimate differential entropies. Furthermore, in Eqn. (5) the posterior has to be recomputed for every possible outcome $\gamma$. Therefore, instead of working with Eqn. (5) directly, we propose to use an equivalent reformulation thereof in terms of predictive distributions [13]:

$$ \arg\max_{\alpha\in\mathcal{A}} \Big\{ H\big[P(\gamma|\alpha,\mathcal{D})\big] - \mathbb{E}_{p(\rho|\mathcal{D})}\Big[ H\big[P(\gamma|\alpha,\rho)\big] \Big] \Big\} \quad (6) $$

In previous studies [7], the system was limited to pure single-qubit states, for which the intractable Bayesian normalising constant can be computed by simple numerical integration; this approach does not extend easily to higher dimensions. Two active learning algorithms were considered there. The first, uncertainty sampling, uses an approximate version of Eqn. (6) in which the second term is ignored. This arguably leads to suboptimal selection behaviour, as the experimenter's uncertainty may be confounded with the inherent randomness of quantum measurements. The second seeks to minimise the Bayes risk, using fidelity as the loss function; this requires a posterior update for the evaluation of every candidate measurement, whereas ABQT requires only one update per complete cycle. Online computation is therefore infeasible: in [8], experimental designs for all $2^N$ possible sequences of experimental outcomes are pre-computed, which limits these methods to very short experiments (fewer than 20 measurements). Combining Eqn. (6) with our SIS Bayesian update scheme allows for fast online experimental design.

The equivalence between Eqns. (5) and (6) becomes clear on realising that both express the conditional mutual information between $\rho$ and $\gamma$. Eqn. (6) offers computational advantages over Eqn. (5): it only involves computing discrete entropies such as $H[P(\gamma|\alpha,\rho)]$ and expectations of these under the posterior. The resulting optimisation problem is generally non-convex in $\alpha$, but the objective's value, and its derivatives with respect to $\alpha$, can now be computed efficiently using our weighted posterior samples from Eqn. (3), allowing us to find the most informative $\alpha$ by direct optimisation.
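To make the sample-based evaluation of Eqn. (6) concrete, here is a minimal Python/NumPy sketch working from weighted samples $\{\rho_s, w_s\}$ as in Eqn. (3): the predictive distribution is approximated by $P(\gamma|\alpha,\mathcal{D}) \approx \sum_s w_s \operatorname{tr}\{M_{\alpha\gamma}\rho_s\}$, and the posterior expectation in the second term becomes a weighted sum. The function names are our own, and the toy posterior (equal weight on $|H\rangle$ and $|V\rangle$) is a hypothetical example.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) along the last axis, with the convention 0 log 0 = 0."""
    safe = np.where(p > 0, p, 1.0)  # avoid log(0); those terms are zeroed below
    return -np.sum(np.where(p > 0, p * np.log(safe), 0.0), axis=-1)


def information_gain(weights, particles, effects):
    """Sample estimate of Eqn (6) for one configuration alpha with effects M_{alpha gamma}.

    H[P(gamma | alpha, D)] - sum_s w_s H[P(gamma | alpha, rho_s)]
    """
    probs = np.stack([np.einsum("ij,sji->s", M, particles).real for M in effects], axis=1)
    probs = np.clip(probs, 0.0, 1.0)  # shape (S, Gamma): P(gamma | alpha, rho_s)
    predictive = weights @ probs      # shape (Gamma,): P(gamma | alpha, D)
    return entropy(predictive) - weights @ entropy(probs)


# Toy posterior: equal weight on |H><H| and |V><V|
particles = np.stack([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]).astype(complex)
weights = np.array([0.5, 0.5])
z_basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # {|H>, |V>}
x_basis = [np.array([[0.5, 0.5], [0.5, 0.5]]), np.array([[0.5, -0.5], [-0.5, 0.5]])]  # {|D>, |A>}

gain_z = information_gain(weights, particles, z_basis)  # outcome reveals which state it is
gain_x = information_gain(weights, particles, x_basis)  # outcome is a fair coin either way
```

Here `gain_z` equals $\log 2$ and `gain_x` is zero, although the predictive entropy (the first term alone) is $\log 2$ in both cases. Uncertainty sampling, which keeps only that first term, would therefore rate the two measurements as equally useful.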

In summary, we propose the following algorithm, called Adaptive Bayesian Quantum Tomography. After each single measurement, ABQT updates its approximate posterior using Eqn. (4), then chooses the next measurement by direct numerical maximisation of the information-theoretic objective in Eqn. (6).
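The loop just described can be sketched for a single qubit in Python/NumPy. This is a simplified illustration under stated assumptions, not the exact procedure: the direct numerical maximisation of Eqn. (6) is replaced by a search over a fixed, finite set of random candidate Bloch directions; the prior uses $K = 2$; resampling is multinomial without an MCMC move step; and the true state and all parameter values are hypothetical.

```python
import numpy as np

# Assumptions for this sketch: one qubit, prior with K = 2, S particles, binary projective
# measurements, and a finite random set of candidate directions in place of gradient-based
# maximisation of Eqn (6). Resampling is multinomial, with no MCMC move step.
rng = np.random.default_rng(1)
I2 = np.eye(2, dtype=complex)
PAULI = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]], dtype=complex)


def effects_for(n):
    """Binary projective POVM along the unit Bloch vector n, shape (2, 2, 2)."""
    s = np.einsum("k,kij->ij", n, PAULI)
    return np.stack([(I2 + s) / 2, (I2 - s) / 2])


def outcome_probs(particles, effects):
    """P(gamma | alpha, rho_s) = tr(M_{alpha gamma} rho_s), shape (S, Gamma)."""
    return np.clip(np.einsum("gij,sji->sg", effects, particles).real, 0.0, 1.0)


def entropy(p):
    """Shannon entropy (nats) along the last axis, with 0 log 0 = 0."""
    return -np.sum(np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0), axis=-1)


def information_gain(weights, particles, effects):
    """Sample estimate of Eqn (6)."""
    probs = outcome_probs(particles, effects)
    return entropy(weights @ probs) - weights @ entropy(probs)


# Prior particles: Haar-random pure states on C^2 (x) C^2 with the ancilla traced out
S = 1000
psi = rng.normal(size=(S, 2, 2)) + 1j * rng.normal(size=(S, 2, 2))
psi /= np.linalg.norm(psi.reshape(S, -1), axis=1)[:, None, None]
particles = psi @ psi.conj().transpose(0, 2, 1)
weights = np.full(S, 1.0 / S)

# Candidate configurations alpha: 50 random Bloch directions
candidates = rng.normal(size=(50, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
candidate_effects = [effects_for(n) for n in candidates]

psi_true = np.array([3.0, 1.0]) / np.sqrt(10.0)  # hypothetical true pure state
rho_true = np.outer(psi_true, psi_true).astype(complex)

for step in range(200):
    # 1) choose the most informative candidate measurement, Eqn (6)
    gains = [information_gain(weights, particles, E) for E in candidate_effects]
    effects = candidate_effects[int(np.argmax(gains))]
    # 2) simulate one outcome (stands in for running the experiment)
    p_true = outcome_probs(rho_true[None], effects)[0]
    gamma = rng.choice(2, p=p_true / p_true.sum())
    # 3) SIS update, Eqn (4), with resampling when the effective sample size drops
    weights = weights * outcome_probs(particles, effects)[:, gamma]
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < S / 2:
        idx = rng.choice(S, size=S, p=weights)
        particles, weights = particles[idx], np.full(S, 1.0 / S)

rho_bme = np.einsum("s,sij->ij", weights, particles)
```

Replacing the candidate search with gradient ascent on $\alpha$ would follow the text more closely; the finite candidate set simply keeps the sketch short and dependency-free.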

EX 1: Single-qubit tomography. In our first simulated experiments we study tomography of single qubits ($D = 2$). Mixed qubit states have three real degrees of freedom; $\rho$ is represented as a point in the unit ball, called the Bloch sphere. For illustration purposes we first omit the third component and only infer the two remaining parameters, which lie in a unit (Bloch) disk. This corresponds to, e.g., determining the linear polarisation of a photon, assuming that the circular polarisation is zero. We allow for arbitrary projective measurements with binary ($\Gamma = 2$) outcomes. These are represented by pairs of antipodal points on the perimeter of the Bloch disk, and $\alpha \in [0, \pi)$ codes for the orientation. Fig. 1 shows the progression of measurement bases chosen by ABQT. The first two measurements are mutually unbiased; however, the third measurement is equally biased with respect to both previous bases, demonstrating that using a fixed MUB set is suboptimal in the adaptive framework. Throughout the rest of the experiment the algorithm explores a wide range of measurements.

FIG. 1. Adaptive selection of measurements based on partial data. Scatter plots show 400 samples from the current posterior. Shaded circles around the 'Bloch disk' show the relative value of the objective in Eqn. (6) for different measurement directions (lighter is higher). Pairs of arrows show the most informative next measurement. Circular histograms show the number of times each measurement direction has been used. (a) Initially, no observations have been made, and the samples shown are from the uniform prior. All measurements are equally informative; we chose to start with $\{|H\rangle, |V\rangle\}$. (b) After one measurement, the posterior is updated, and the next best measurement, $\{|D\rangle, |A\rangle\}$, is mutually unbiased with respect to the first one. (c) After two observations, the next best measurement is equally biased with respect to the first two bases. (d) The posterior after 1000 observations concentrates around the true state. The method tries a range of measurements, with a tendency to point towards the solution.

FIG. 2. One-qubit tomography using projective measurements. (a) Improvement of mean posterior fidelity as the experiment progresses. Results are shown for uniformly sampled measurements, uniformly sampled Pauli measurements, ABQT selecting adaptively amongst the 3 Pauli measurements, and ABQT picking general measurements. Adaptive optimisation of measurements allows for an almost $n^{-1}$ rate of convergence, while the other methods are more consistent with an $n^{-1/2}$ rate. (b) Final value of the mean posterior infidelity after 6000 measurements using the same four methods, as a function of the purity of the state to be estimated. The advantage of ABQT is greatest for purer states.

FIG. 3. Two-qubit QST with measurements chosen uniformly amongst MUBs or SSQT bases, and with ABQT picking from the same set of MUBs, from the SSQT bases, or from a more flexible set of 81 separable bases. Cases (a)-(c) are the same as those in [2]; (d) shows average results over 20 randomly generated entangled pure states. (a) As expected, for the maximally mixed state the choice of measurement strategy has little effect. (b) On the entangled state $(|HH\rangle + |VV\rangle)/\sqrt{2}$, MUBs outperform SSQT when sampled uniformly, but allowing for adaptivity closes the performance gap. (c) SSQT outperforms MUBs on the separable state $|HV\rangle$, but again, when measurements are picked adaptively the two sets perform similarly. (d) For random pure states, a large improvement in performance is obtained by performing ABQT with the flexible set of separable measurements. Using this set, ABQT needs only $10^4$ measurements to achieve $\approx 98.7\%$ mean fidelity, a level for which MUB tomography needs $10^5$ measurements.
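The Bloch-disk parameterisation used in EX 1 can be made concrete with a short Python/NumPy sketch. We assume (this is not specified in the text) that the disk is the $x$-$z$ plane of the Bloch sphere, with $\{|H\rangle,|V\rangle\}$ at $\alpha = 0$ and $\{|D\rangle,|A\rangle\}$ at $\alpha = \pi/2$. The squared overlaps between basis vectors then depend only on the angle between measurement axes, $|\langle\psi_\alpha|\psi_\beta\rangle|^2 = \cos^2((\alpha-\beta)/2)$, so bases with perpendicular axes in the disk are mutually unbiased, while an axis at $\pi/4$ to both is equally biased with respect to each.

```python
import numpy as np


def disk_basis(alpha):
    """Orthonormal basis (as columns) for the measurement axis at angle alpha in [0, pi).

    Assumed convention: |psi_alpha> = cos(alpha/2)|H> + sin(alpha/2)|V>, whose Bloch vector
    is (sin alpha, 0, cos alpha); the second column is the antipodal (orthogonal) state.
    """
    c, s = np.cos(alpha / 2), np.sin(alpha / 2)
    return np.array([[c, -s], [s, c]])


def overlaps(alpha, beta):
    """Matrix of squared overlaps |<e_i(alpha)|f_j(beta)>|^2 between two disk bases."""
    return np.abs(disk_basis(alpha).T @ disk_basis(beta)) ** 2


def mutually_unbiased(alpha, beta):
    """Two bases in dimension D = 2 are mutually unbiased if every squared overlap is 1/2."""
    return np.allclose(overlaps(alpha, beta), 0.5)


hv_vs_da = mutually_unbiased(0.0, np.pi / 2)  # {|H>,|V>} vs {|D>,|A>}
diag_vs_hv = overlaps(0.0, np.pi / 4)         # axis at pi/4 compared with alpha = 0
diag_vs_da = overlaps(np.pi / 2, np.pi / 4)   # the same axis compared with alpha = pi/2
```

Shifting $\alpha$ by $\pi$ only swaps the two outcome labels, which is why $[0, \pi)$ suffices to index all binary projective measurements in the disk.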

Fig. 2 shows that, this time inferring all three coordinates in the full Bloch sphere, the mean posterior fidelity improves at a faster rate when measurements are adaptively optimised. We quantify performance as mean posterior fidelity, rather than the fidelity of the Bayesian mean estimate, as the latter gives no indication of the confidence in our estimate. The convergence rate of the adaptive method is more consistent with an $n^{-1}$ power law than with the $n^{-1/2}$ law predicted for non-adaptive methods [2, and refs. therein]. Fig. 2(b) shows a larger advantage for states of high purity (purity being defined as the sum of squared eigenvalues of $\rho$).
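The two figures of merit used here can be sketched as follows (Python/NumPy; helper names are our own). We assume the squared Uhlmann convention $F(\rho,\sigma) = (\operatorname{tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}})^2$; the mean posterior fidelity is then its weighted average over posterior samples, and purity is $\operatorname{tr}\rho^2$, i.e. the sum of squared eigenvalues.

```python
import numpy as np


def psd_sqrt(A):
    """Matrix square root of a Hermitian positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh((A + A.conj().T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T


def fidelity(rho, sigma):
    """Fidelity F = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2 (squared convention assumed)."""
    r = psd_sqrt(rho)
    return np.trace(psd_sqrt(r @ sigma @ r)).real ** 2


def mean_posterior_fidelity(weights, particles, rho_true):
    """sum_s w_s F(rho_true, rho_s) over weighted posterior samples."""
    return float(sum(w * fidelity(rho_true, p) for w, p in zip(weights, particles)))


def purity(rho):
    """tr(rho^2), computed as the sum of squared eigenvalues."""
    return float(np.sum(np.linalg.eigvalsh(rho) ** 2))


rho_H = np.diag([1.0, 0.0]).astype(complex)
rho_V = np.diag([0.0, 1.0]).astype(complex)
rho_mixed = np.eye(2, dtype=complex) / 2

# A posterior split evenly between |H> and |V>, scored against the true state |H>
mpf = mean_posterior_fidelity([0.5, 0.5], [rho_H, rho_V], rho_H)
```

For a pure true state $|\psi\rangle$ the fidelity reduces to $\langle\psi|\sigma|\psi\rangle$, which avoids the matrix square roots altogether.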

EX 2: Separable vs. MUB tomography of two qubits. In multipartite systems, such as $m$-qubit registers, there are two fundamentally different classes of measurements one can apply: separable or entangling. Separable tomographic experiments are straightforward and cheap to implement, while entangling measurements are statistically more powerful. Notably, implementing MUBs requires entangling measurements. These differences are discussed extensively in [2]. To investigate this trade-off in the light of adaptive tomography, we reproduce and extend the experiments in [2]. Results are shown in Fig. 3. All substantial differences between MUB and standard separable tomography (SSQT) vanish once we allow for adaptivity (Fig. 3(a)-(c)). Furthermore, for random pure states, we are able to realise a ten-fold improvement over MUBs when using flexible separable measurements (Fig. 3(d)). These results indicate that allowing for adaptivity with an imperfect but flexible set of measurements offers greater advantages than using a fixed set of MUBs.
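The gap between separable bases and a complete MUB set can be checked numerically. The sketch below (Python/NumPy; names are illustrative) builds the nine two-qubit product bases formed from the eigenbases of the Pauli operators, which we assume here as the standard separable set, and tests which pairs are mutually unbiased, i.e. have all squared overlaps equal to $1/D = 1/4$. Two such product bases turn out to be unbiased exactly when they differ on both qubits, so at most three of the nine are pairwise unbiased, fewer than the $D + 1 = 5$ bases of a complete MUB set in dimension 4.

```python
import numpy as np
from itertools import combinations

# Single-qubit eigenbases (columns are basis vectors) of sigma_z, sigma_x, sigma_y
SINGLE = {
    "Z": np.eye(2, dtype=complex),
    "X": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
    "Y": np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),
}

# Nine separable two-qubit bases: column i*2 + j is |e_i> (x) |f_j>
product_bases = {a + b: np.kron(SINGLE[a], SINGLE[b]) for a in SINGLE for b in SINGLE}


def mutually_unbiased(B1, B2):
    """True if |<e_i|f_j>|^2 = 1/D for all basis vectors e_i of B1 and f_j of B2."""
    D = B1.shape[0]
    return np.allclose(np.abs(B1.conj().T @ B2) ** 2, 1.0 / D)


def largest_unbiased_subset(bases):
    """Brute-force search for the largest family of pairwise mutually unbiased bases."""
    names = list(bases)
    for k in range(len(names), 0, -1):
        for subset in combinations(names, k):
            if all(mutually_unbiased(bases[p], bases[q]) for p, q in combinations(subset, 2)):
                return subset
    return ()


unbiased_pairs = [(p, q) for p, q in combinations(product_bases, 2)
                  if mutually_unbiased(product_bases[p], product_bases[q])]
best = largest_unbiased_subset(product_bases)
```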



In summary, we have presented a new adaptive optimal experimental design framework and method based on Bayesian inference and Shannon's information. We showed that mutually unbiased bases, widely accepted as the optimal measurements, represent only a partial solution and are suboptimal in the adaptive framework. Moreover, the adaptive framework applies regardless of dimensionality, and can be applied to spaces where complete sets of MUBs may not even exist [3, 13]. This motivates a shift in experimental focus from implementing complex entangling measurements to implementing quickly reconfigurable, simpler measurements. In quantum optics, this could feasibly be achieved via mechanically or electronically controlled liquid crystal wave plates.